\section{Dataset}
\label{sec:dataset}
The original MovieLens dataset contains $100{,}000$ ratings of \nummovies{}
movies by \numusers{} users, together with the timestamp at which each rating was made.
The table provided to us contains one row per rating, with columns for
the user id, the movie id, the star rating, and the timestamp. To construct
our model, we consider only the user, the movie, and the rating. We ignore the
timestamp field to simplify the model, although we note that relating the time
of a rating to its value could improve the accuracy of our models, as seen
in~\cite{netflixprize}.

\subsection{Preprocessing of the Dataset}
Because not every user rates every movie, the user-movie rating matrix implied
by the dataset described above has missing entries. We account for these
missing entries by building a sparse matrix with Matlab's \texttt{sparse}
function. The sparse matrix stores only the observed ratings, each with its row
and column index, so the rating given to a movie by a user can be looked up directly.
This sparse matrix is used by the algorithm described in Section~\ref{sec:algorithm}.
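
As an illustration of this construction, the following minimal sketch builds
such a matrix from a few made-up rating triplets. The variable names, the toy
values, and the choice of users as rows and movies as columns are assumptions
made for the example, not a description of our actual code. It also assumes
that every rating is nonzero, since \texttt{sparse} does not store zeros and a
zero rating would therefore look like a missing entry.

\begin{verbatim}
% Hypothetical toy data: one entry per rating (not taken from MovieLens).
users   = [1; 1; 2; 3];   % user id of each rating
movies  = [1; 3; 2; 3];   % movie id of each rating
ratings = [5; 3; 4; 1];   % star rating of each rating

numUsers  = 3;            % assumed matrix dimensions for the toy example
numMovies = 4;

% Rows index users and columns index movies (an assumed orientation).
% Only the four observed ratings are stored.
R = sparse(users, movies, ratings, numUsers, numMovies);

% Look up the rating that user 1 gave to movie 3.
r13 = full(R(1, 3));

% Recover the list of observed (user, movie, rating) triplets.
[u, m, r] = find(R);

% Number of stored (observed) ratings.
numObserved = nnz(R);
\end{verbatim}

Note that \texttt{sparse} adds together values that share the same row and
column index, so the sketch also assumes that each user rates a given movie at
most once.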


